Abstract

Novel aspects of human dynamics and social interactions are inves- 
tigated by means of mobile phone data. Using extensive phone records 
resolved in both time and space, we study the mean collective behav- 
ior at large scales and focus on the occurrence of anomalous events. 
We discuss how these spatiotemporal anomalies can be described us- 
ing standard percolation theory tools. We also investigate patterns of 
calling activity at the individual level and show that the interevent 
time of consecutive calls is heavy-tailed. This finding, which has im- 
plications for dynamics of spreading phenomena in social networks, 
agrees with results previously reported on other human activities. 

1 Introduction 

Mobile phones are becoming increasingly ubiquitous throughout large por- 
tions of the world, especially in highly populated urban areas and particu- 
larly in industrialized countries, where mobile phone penetration is almost 
100%. Mobile phone providers regularly collect extensive data about the call 
volume, calling patterns, and the location of the cellular phones of their
subscribers. In order for a mobile phone to place outgoing calls and to receive
incoming calls, it must periodically report its presence to nearby cell towers, 
thus registering its position in the geographical cell covered by one of the 
towers. Hence, very detailed information on the spatiotemporal localization 
of millions of users is contained in the extensive call records of any mobile 
phone carrier. If misused, these records - as well as similar datasets on buy- 
ing habits, e-mail usage, and web-browsing, for instance - certainly pose a 
serious threat to the privacy of the users. However, the use of privacy-safe, 
anonymized datasets represents a huge scientific opportunity to uncover the
structure and dynamics of the social network at different levels, from the 
small-scale individual's perspective to the large-scale, collective behavior of 
the masses, with an unprecedented degree of reach and accuracy. Besides the 
inherent scientific interest of these issues, deeper insight into applications of 
great practical importance could certainly be gained. For instance, urban 
planning, public transport design, traffic engineering, disease outbreak con- 
trol, and disaster management, are some areas that will greatly benefit from 
a better understanding of the structure and dynamics of social networks [1].

The use of mobile phone data as a proxy for social interaction has already
proved successful in several recent investigations. Onnela et al. [2, 4]
have analyzed the structure of weighted call graphs arising from reciprocal 
calls that serve as signatures of work-, family-, leisure- or service-based rela- 
tionships. A coupling between interaction strengths and the network's local 
structure was observed, with the counterintuitive consequence that social 
networks turn out to be robust to the removal of the strong ties but fall 
apart following a phase transition if the weak ties are removed. Szabo and 
Barabasi [3] have studied social network effects in the spread of innovations, 
products and new services. They investigated different mobile phone-based 
services and found the coexistence on the same social network of two dis- 
tinct usage classes, with either very strong or very weak community-based 
segregation effects. In the context of urban studies and planning, Ratti et 
al. [5, 6] have considered the potential use of aggregated data from mobile
phones and other hand-held devices. Their "Mobile Landscapes" project 
aims at the application of location based services to urban studies in order 
to gain insight into complex and rapidly changing urban dynamics phenomena.
More recently, Palla, Barabasi and Vicsek [7, 8] used mobile phone data
to study the evolution of social groups. They found that large groups persist 
for longer times if they are capable of dynamically altering their member- 
ship, suggesting that an ability to change the group composition results in 
better adaptability. In contrast, the behavior of small groups displays the 
opposite tendency, the condition for long-term persistence being that their 
composition remains stable.

In the following sections, we present new results that address novel aspects 
of human dynamics and social interactions obtained from extensive mobile 
phone data. In Sect. 2 we show how large-scale collective behavior can 
be described using aggregated data resolved in both time and space. We 
stress the importance of investigating large departures from the average and 
develop the basic framework to quantify anomalous fluctuations by means of 
standard percolation theory tools. In Sect. 3 we focus on the individual level 
and study patterns of calling activity. We show that the interevent time 
of consecutive calls is heavy-tailed, a finding that has implications for the 
dynamics of spreading on social networks [9, 10, 11, 12, 13, 14, 15, 16, 17].
Furthermore, by fixing the time of observation between consecutive calls it 
is possible to use the phone call data to characterize some aspects of human 
mobility. 

2 Fluctuations in aggregated spatiotemporal 
call activity patterns 

The spatial dependence of the call activity at any given time can be conve- 
niently displayed by means of maps divided in Voronoi cells, which delimit 
the area of influence of each transceiver tower or antenna. The Voronoi
tessellation partitions the plane into polygonal regions, associating each region
with one transceiver tower. The partition is such that all points within a 
given Voronoi cell are closer to its corresponding tower than to any other 
tower in the map. 
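
As an illustration of this definition, the following minimal Python sketch assigns points to Voronoi cells by finding the nearest tower, which is equivalent to cell membership. It assumes planar coordinates and Euclidean distance; the function name `assign_to_towers` and the toy coordinates are hypothetical and not taken from the dataset (in real call records, the serving tower is already part of each record).

```python
import numpy as np

def assign_to_towers(points, towers):
    """Return, for each point, the index of its nearest tower.

    A point lies in the Voronoi cell of tower k exactly when tower k is
    its nearest tower, so this is Voronoi cell membership.
    points: array of shape (P, 2); towers: array of shape (K, 2).
    """
    points = np.asarray(points, dtype=float)
    towers = np.asarray(towers, dtype=float)
    # Pairwise squared Euclidean distances, shape (P, K).
    diff = points[:, None, :] - towers[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)
    return np.argmin(d2, axis=1)

# Toy example (made-up coordinates): three towers, four call locations.
towers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
calls = np.array([[1.0, 1.0], [9.0, 2.0], [2.0, 8.0], [6.0, 1.0]])
cell = assign_to_towers(calls, towers)
# Number of calls handled by each tower, as displayed in an activity map.
calls_per_tower = np.bincount(cell, minlength=len(towers))
```

Counting the assigned indices per tower gives the per-cell call volume that an activity map displays.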

Figure 1 shows activity maps for aggregated data corresponding to a 1-hour
interval. The upper panel shows the activity pattern (in log10 scale) for
a peak hour (Monday noon), while the lower panel shows the same urban 
neighborhood during an off-peak hour (Sunday at 9 am). The differences be- 
tween both panels reflect the intrinsic rhythm and pulse of the city: we can 
expect call patterns during peak hours to be dominated by the hectic activity 
around business and office areas, whereas other, presumably residential and 
leisure areas can show increased activity during off-peak times, thus lead- 
ing to different, spatially distinct activity patterns. Besides different spatial 
patterns, each particular time of the day, as well as each day of the week, 
is characterized by a different overall level of activity. This phenomenon is 
shown by the plot at the center of Figure 1, in which aggregated data for 
a country is shown as a function of time (data was binned in time intervals 
of 1 hour). As expected, the overall normalization of the aggregated pattern
is lower during weekends than during weekdays, except around weekend
midnights and early mornings, when many people go out.

Figure 1: Call activity maps in an urban neighborhood, showing the number
of calls per hour managed by each transceiver tower or antenna (dots). The
division in terms of Voronoi cells defines the area of reach of each tower. Call
traffic patterns depend on time and day of the week, as shown by comparing
the map on a Monday at noon (upper panel) with that on a Sunday at 9
am (lower panel). The bars on the right side of each panel correspond to the
number of calls per hour and tower in log10 scale.

The minimum spatial resolution is determined by either the typical dis- 
tance between towers or, in rural regions with sparse tower density, by the 
reach of the radio-frequency signals exchanged between the mobile handset 
and the antenna (typically ranging from a few hundred meters to several
kilometers). To explore activity differences at larger scales, the data of
neighboring cells can be aggregated. At the expense of some loss of spatial resolution,
aggregating data into larger spatial bins (taking, e.g., a regular spatial grid 
covering the entire country) allows for better statistics and for a more stable 
activity pattern. That is, the number of calls made from a group of nearby
cells at a certain time and day of the week is expected to be fairly constant, 
except for small statistical fluctuations. 
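
The aggregation into a regular grid can be sketched as follows. This is a minimal illustration, assuming each tower is represented by planar coordinates and a call count; the function name `aggregate_to_grid`, the extent tuple, and the toy numbers are hypothetical.

```python
import numpy as np

def aggregate_to_grid(tower_xy, tower_calls, bin_size, extent):
    """Sum per-tower call counts inside square bins of side bin_size.

    tower_xy: array of shape (K, 2) with tower coordinates.
    tower_calls: array of shape (K,) with calls per tower.
    extent: (xmin, xmax, ymin, ymax) of the region covered by the grid.
    Returns a 2D array indexed as grid[x_bin, y_bin].
    """
    xmin, xmax, ymin, ymax = extent
    nx = int(np.ceil((xmax - xmin) / bin_size))
    ny = int(np.ceil((ymax - ymin) / bin_size))
    x_edges = xmin + bin_size * np.arange(nx + 1)
    y_edges = ymin + bin_size * np.arange(ny + 1)
    xy = np.asarray(tower_xy, dtype=float)
    weights = np.asarray(tower_calls, dtype=float)
    grid, _, _ = np.histogram2d(
        xy[:, 0], xy[:, 1], bins=[x_edges, y_edges], weights=weights
    )
    return grid

# Toy example (made-up values): four towers on a 24 x 24 region, bin size 12.
tower_xy = [[1.0, 1.0], [5.0, 3.0], [13.0, 2.0], [20.0, 20.0]]
tower_calls = [5, 7, 4, 10]
grid = aggregate_to_grid(tower_xy, tower_calls, bin_size=12.0,
                         extent=(0.0, 24.0, 0.0, 24.0))
```

Larger bins pool more towers per bin, which reduces relative fluctuations at the cost of spatial detail.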

Usually, activity patterns are strongly correlated with the daily pulse of 
populated areas (such as those shown in Fig. 1) and, at a larger scale, with
variations in population density between different regions within the country. 
In contrast, departures from the mean expected activity are in general not 
trivially correlated with population density and describe instead interesting 
dynamical features. 

The measurement of fluctuations around the mean expected activity is of 
paramount importance, since it allows a quantitative measurement of anoma- 
lous behavior and, ultimately, of possible emergency situations. This indeed 
constitutes the basis of proposed real-time monitoring tools such as the
Wireless Phone-based Emergency Response (WIPER) system [18]. Anomalous
patterns indicative of a crisis (such as the occurrence of natural catastrophes 
and terrorist attacks) could be detected in real time, plotted on satellite and 
GIS-based maps of the area, and used in the immediate evaluation of mitiga- 
tion strategies, such as potential evacuation routes or barricade placement, 
by means of computer simulations [18, 19].

The call volume shows strong variations with time and day of the week, 
as shown in Figure 1, but differences across subsequent weeks are generally 
mild (provided one considers call traffic in the same place, time and day of 
the week). To capture the weekly periodicity of the observed patterns, we 
define $n_i(r,t,T)$ as the number of calls recorded at location $r$ (which can
either denote a single Voronoi cell or a group of neighboring cells) during
the $i$th week between times $t$ and $t + T$, where time is defined modulo 1
week. Assuming we have access to continuous data for $N$ weeks, the mean
call activity is given by

$$\langle n(r,t,T) \rangle = \frac{1}{N} \sum_{i=1}^{N} n_i(r,t,T) . \qquad (1)$$

Note that, in the same way as one can trade off spatial resolution for in- 
creased statistics by summing over a group of Voronoi cells, varying T one 
can regulate time accuracy versus statistics. This certainly depends on the 
extent to which aggregated data shows a regular, stable behavior. The results 
presented here correspond to T = 1 hour. 

Figure 2: Activity and fluctuations in a regular 2D grid showing a normal
event (left panels) and an anomalous one (right panels). The activity is
displayed in terms of the number of calls per hour inside each square bin in
log10 scale (upper panels). High-activity bins above the fluctuation threshold
$A_{thr} = 0.25$ are shown in black, while bins with normal activity are shown in
grey (bottom panels). Bins in white correspond to areas not covered by the
mobile phone carrier.

The scale to measure departures from the average behavior is set by the
standard deviation, defined as

$$\sigma(r,t,T) = \sqrt{ \frac{1}{N-1} \sum_{i=1}^{N} \left( n_i(r,t,T) - \langle n(r,t,T) \rangle \right)^2 } . \qquad (2)$$



Hence, using recorded data for an extended period of time, one can determine 
the expected call traffic levels and corresponding deviations for all times and 
locations. Once this normal behavior is established, anomalous fluctuations 
above or below a given threshold can be obtained using the condition 



$$\left| n_i(r,t,T) - \langle n(r,t,T) \rangle \right| > A_{thr} \times \sigma(r,t,T) , \qquad (3)$$

where $A_{thr} > 0$ is a constant that sets the fluctuation level.

Figure 3: Size of the largest cluster as a function of the fluctuation threshold
for the normal case (left) and the anomalous one (right). Measurements on
the call data (solid line with circles) are compared to those of randomized
distributions, of which we show the mean (long-dashed line) and confidence
bounds at $\pm\sigma_{rdm}$ (short-dashed lines) and $\pm 2\sigma_{rdm}$ (dotted lines).
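
The baseline statistics and the threshold condition can be sketched in a few lines of Python. This is a minimal illustration, assuming the weekly counts for a fixed time slot are stacked in an array whose first axis runs over weeks; the function names and toy numbers are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption about the normalization.

```python
import numpy as np

def weekly_baseline(counts):
    """Mean and standard deviation of call counts over weeks.

    counts: array of shape (N_weeks, ...) holding n_i(r, t, T) for a fixed
    time slot t, one entry per week i and per location r.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    # Assumption: sample standard deviation (1/(N-1) normalization).
    std = counts.std(axis=0, ddof=1)
    return mean, std

def is_anomalous(counts_week, mean, std, a_thr):
    """Flag locations whose deviation from the mean exceeds a_thr * std."""
    return np.abs(np.asarray(counts_week, dtype=float) - mean) > a_thr * std

# Toy example (made-up counts): 4 weeks, 2 locations.
counts = np.array([[10, 100], [12, 100], [14, 100], [12, 160]])
mean, std = weekly_baseline(counts)
flags = is_anomalous(counts[3], mean, std, a_thr=0.25)
```

In this toy case only the second location is flagged in the last week, since its count departs from the mean by more than a quarter of its standard deviation.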

We grouped Voronoi cells together generating a regular 2D grid made of 
square bins of about 12 km of linear size. Considering a fixed time slice, we 
study the spatial clustering of bins showing anomalous activity at different 
fluctuation levels. In order to illustrate our procedure, Figure 2 shows the
activity and fluctuations in a grid of size 40 × 40 bins (i.e., a 480 × 480 km²
area). We compare the activity in the same region for 2 different weeks
(corresponding to the same time and day of the week). The left panels show
a normal event, in which fluctuations around the local mean activity are 
typically small, with just a few scattered bins having somewhat larger devia- 
tions. The right panels, however, show an anomalous event, characterized by 
extended, spatially correlated fluctuations that indicate the emergence of a 
large-scale, coordinated activity pattern. As pointed out above, the existence 
of anomalous activity patterns could be indicative of possible emergency sit- 
uations. Similarly to the Voronoi maps already discussed, the upper panels 
in Fig. 2 show the activity (number of calls per hour inside each square bin) 
in log10 scale. White bins correspond to areas not covered by the mobile
phone provider. Taking a fixed threshold value $A_{thr} = 0.25$, the bottom
panels show the high-activity bins above the fluctuation threshold (in black)
and the bins with normal activity (in grey). Note that, although the activity
maps look so similar that they seem at first glance
indistinguishable, the fluctuation maps display striking differences.






Figure 4: Number of different clusters as a function of the fluctuation threshold
for the normal case (left) and the anomalous one (right). Measurements
on the call data (solid line with circles) are compared to results on random 
configurations (dashed and dotted lines). 

In order to quantify the clustering of anomalous bins, we will use the 
standard tools of percolation theory and determine the size of the largest 
cluster, the number of different clusters, and the size distribution of all clus- 
ters. The statistical significance of the measured clustering is evaluated by 
comparing it to results from randomized distributions, in which many differ- 
ent configurations are randomly generated, keeping fixed the total number of 
high-activity bins above the fluctuation threshold. The substrate, which is
formed by all bins with non-zero activity, remains always the same (in Fig. 2, 
for instance, the substrate is the set of all grey and black bins). Clusters 
are defined by first- and second-order nearest neighbors in the square 2D 
grid. In the remainder of this section, we will focus on a specific large-scale 
anomalous event and compare it to the normal behavior observed in data of 
a different week (but corresponding to the same time and day of the week).
The comparison between normal and anomalous events will illustrate the use 
of percolation observables as diagnostic tools for anomaly detection. 
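
The cluster observables and the randomized comparison can be sketched as follows. This is a minimal illustration, not the original analysis code: it interprets "first- and second-order nearest neighbors" as 8-neighbor connectivity on the square grid (edge and diagonal neighbors), and the function names and toy grid are hypothetical.

```python
from collections import deque

import numpy as np

# Edge neighbors (first order) and diagonal neighbors (second order).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def cluster_sizes(active):
    """Sizes of connected clusters of True bins, largest first."""
    active = np.asarray(active, dtype=bool)
    rows, cols = active.shape
    seen = np.zeros_like(active)
    sizes = []
    for r0, c0 in zip(*np.nonzero(active)):
        if seen[r0, c0]:
            continue
        # Breadth-first search over the cluster containing (r0, c0).
        seen[r0, c0] = True
        queue = deque([(r0, c0)])
        size = 0
        while queue:
            r, c = queue.popleft()
            size += 1
            for dr, dc in NEIGHBORS:
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and active[rr, cc] and not seen[rr, cc]):
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        sizes.append(size)
    return sorted(sizes, reverse=True)

def randomized_largest_cluster(substrate, n_active, n_configs=100, seed=0):
    """Mean and std of the largest cluster size over random configurations.

    n_active bins are placed uniformly at random on the substrate (bins
    with non-zero activity), keeping the number of active bins fixed.
    """
    substrate = np.asarray(substrate, dtype=bool)
    rng = np.random.default_rng(seed)
    sites = np.flatnonzero(substrate)
    largest = []
    for _ in range(n_configs):
        chosen = rng.choice(sites, size=n_active, replace=False)
        config = np.zeros(substrate.size, dtype=bool)
        config[chosen] = True
        sizes = cluster_sizes(config.reshape(substrate.shape))
        largest.append(sizes[0] if sizes else 0)
    return float(np.mean(largest)), float(np.std(largest))

# Toy example: 5 active bins on a fully covered 4 x 4 grid.
active = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 0],
                   [1, 0, 0, 1]], dtype=bool)
sizes = cluster_sizes(active)
rdm_mean, rdm_std = randomized_largest_cluster(np.ones((4, 4), dtype=bool),
                                               n_active=int(active.sum()))
```

Here `sizes[0]` is the size of the largest cluster and `len(sizes)` the number of different clusters; comparing `sizes[0]` with `rdm_mean` in units of `rdm_std` gives a rough measure of how far the observed clustering departs from the random expectation.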

Figure 3 shows the size of the largest cluster, $S_{max}$, as a function of
the fluctuation threshold $A_{thr}$, for the normal case (left) and the anomalous
one (right). Each measured plot (solid line with circles) is compared to
results from randomized distributions. The latter correspond to the mean
(long-dashed line) and confidence bounds at $\pm\sigma_{rdm}$ (short-dashed lines) and
$\pm 2\sigma_{rdm}$ (dotted lines), as obtained from generating 100 random configurations
in each case. As expected, the plots show that the size of the largest
cluster monotonically decreases with the fluctuation threshold. However,
while the clustering in the normal case lacks any significance, the anomalous
event shows large departures from the clustering expected in a random
configuration.

Figure 5: Cumulative size distribution of all clusters as a function of cluster
size, for $A_{thr} = 0.25$ (upper panels), $A_{thr} = 0.75$ (bottom panels), normal
case (left panels), and anomalous case (right panels). Thick solid lines are
measurements on the call data, while dashed and dotted lines are results
from random configurations.

In the same vein, Figure 4 shows the number of different clusters, $N_{cl}$,
as a function of the fluctuation threshold $A_{thr}$, where measurements on the
call data for the same normal (left) and anomalous (right) events are compared
to results from randomized configurations. As before, in the normal
case the number of clusters agrees well with the expectations for random 
configurations, while significant departures are observed in the anomalous 
case. 

Figure 5 shows the cumulative size distribution of all clusters, $N_{cl}(s_{cl} > S)$,
as a function of the cluster size $S$, compared to random configurations.
The upper panels display results for $A_{thr} = 0.25$, while the bottom ones show
results for $A_{thr} = 0.75$, as indicated. Moreover, the left panels correspond to
the normal event, while the right panels correspond to the anomalous event. Again, the
measured cluster size distribution in the normal case is in good agreement
with the expected one for a random configuration. In contrast, the anomalous
event shows the occurrence of a few very large clusters formed by many
highly active bins. These unusually large structures cannot be explained as 
arising just from random configurations, but instead are the result of the 
spatiotemporal correlation of large, highly active regions. 

In summary, in this section we showed how large-scale collective behavior
can be described using aggregated data resolved in both time and
space. Moreover, we developed the basic framework for detecting and
characterizing spatiotemporal fluctuation patterns, which is based on standard
procedures of statistics and percolation theory. These tools are particularly
effective in detecting extended anomalous events, such as those expected to occur
in emergency scenarios due to, e.g., natural catastrophes and terrorist attacks.

3 Individual calling activity patterns 

In order to use the huge amount of data recorded by mobile phone carriers
to investigate various aspects of human dynamics [H [201 HB ISSl [23], a
necessary starting point is to characterize the dynamics of the individual
calling activity per se. Previous studies have measured the time between
consecutive individual-driven events, such as sending e-mails, printing, and
visiting web pages or the library [25]. Those events are described by
heavy-tailed processes [20, 26], challenging the traditional Poissonian modeling
framework [23, [2H1 12HI [33 [SI], with consequences on task completion
in computer systems. In this section we explore the interevent distribution 
of the calling activity of 6 x 10^ mobile phone users during 1 month. 

Like many other human activities, the calling activity pattern is highly
heterogeneous. While some users rarely use the mobile phone, others make
hundreds or even thousands of calls each month. To analyze such different
levels of activity, we group the users based on their total number of calls.
Within each group, we measure the probability density function $P(\Delta T)$ of
the time interval $\Delta T$ between two consecutive calls made by each user. As
shown by the inset of Fig. 6, the tail of the distribution is shifted to longer
interevent times for users with less activity. However, if we plot $\Delta T_a P(\Delta T)$
as a function of $\Delta T/\Delta T_a$, where $\Delta T_a$ is the average interevent time for the
corresponding user, the data collapse into a single curve (Fig. 6). This
indicates that the measured interevent distribution follows the expression
$P(\Delta T) = (1/\Delta T_a) \mathcal{F}(\Delta T/\Delta T_a)$, where $\mathcal{F}(x)$ is independent of the average
activity level of the population. This represents a universal characteristic of
the system that surprisingly also coincides with results from e-mail communication
[32]. The data are well fitted by

$$P(\Delta T) = (\Delta T)^{-\alpha} \exp(-\Delta T/\tau_c) , \qquad (4)$$

where the power law scaling with exponent $\alpha = 0.9 \pm 0.1$ is followed by
an exponential cutoff at $\tau_c \approx 48$ days. Equation (4) is shown by a solid
line in the inset of Fig. 6 and its scaled version is presented in the main
panel of the figure using $\Delta T_a = 8.2$ hours, which is the average interevent
time measured for the whole population. This result, clearly different from
the one predicted by a Poisson approximation ^SniESlElj, would for instance
affect the predictions of spreading dynamics through the network of calls [33].

Figure 6: Interevent time distribution $P(\Delta T)$ for calling activity. $\Delta T$
corresponds to the time interval between two mobile phone calls sent by the
same user. Different symbols indicate the measurements done over groups of
users with different activity levels (number of calls). The inset shows the unscaled
interevent time distribution and the solid line corresponds to Eq. (4).
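
The rescaling used for the data collapse and the fitted form of Eq. (4) can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up numbers. It relies on the observation that the density of the rescaled variable $x = \Delta T/\Delta T_a$ equals $\Delta T_a P(\Delta T)$, so the collapse plot is simply a histogram of the rescaled interevent times; Eq. (4) is evaluated without a normalization constant, with times in days.

```python
import numpy as np

def interevent_times(call_times):
    """Time gaps between consecutive calls of one user."""
    t = np.sort(np.asarray(call_times, dtype=float))
    return np.diff(t)

def rescaled_density(dts, bins):
    """Density of x = dt / mean(dt), i.e. dt_a * P(dt) as a function of x."""
    dts = np.asarray(dts, dtype=float)
    dt_a = dts.mean()
    density, edges = np.histogram(dts / dt_a, bins=bins, density=True)
    return density, edges, dt_a

def truncated_power_law(dt, alpha=0.9, tau_c=48.0):
    """Unnormalized Eq. (4): power law with an exponential cutoff.

    dt and tau_c must share the same unit (days for the default tau_c).
    """
    dt = np.asarray(dt, dtype=float)
    return dt ** (-alpha) * np.exp(-dt / tau_c)

# Toy example (made-up values).
gaps = interevent_times([5.0, 1.0, 3.0])
density, edges, dt_a = rescaled_density([1.0, 1.0, 2.0, 4.0],
                                        bins=[0.0, 1.0, 3.0])
value_at_one_day = truncated_power_law(1.0)
```

For heavy-tailed data, logarithmically spaced bins (for instance from `np.logspace`) are usually a better choice than the coarse linear bins of this toy example.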

To explore the interplay between human activity and mobility patterns,
we fix the characteristic observation time to $\Delta T_0 = 30$ min and collect only
those consecutive calls that occur with this interevent time, recording also
the time of the day in which they occurred (Fig. 7a). For each pair of
calls, we count how many of them result in a change of coordinate, i.e., the
user traveled in the 30 min time interval between the calls (Fig. 7b). The
number of events that result in a change of location and the number of calls
as a function of time capture the daily activity pattern of the users. We
find that both the call and the mobility pattern decrease at night and have
clear peaks near noon and late evening. There is a factor of 30 between the
largest and the smallest number of events (calls/changes of location) reported
during the day. Interestingly, when we calculate the fraction of consecutive
calls also resulting in a potential change of location, the quantity varies at
most 40% during the whole day (Fig. 7c). This indicates that although the
total activity varies strongly, the percentage of the people that are calling
and traveling remains rather stable. More importantly, the average distance
traveled within $\Delta T_0 = 30$ min is stable in the vicinity of $\Delta r = 6 \pm 2$ km
(Fig. 7d), a value consistent with a combination of walking and motorized
transportation.

Figure 7: Travel behavior. (a)-(b) Number of trips and consecutive calls
that are reported within a fixed interevent time $\Delta T_0 = 30$ min vs. time of
the day. (c) The ratio of the two quantities described in (a) and (b) shows
that throughout the day 40 ± 20% of the people who are calling seem to
be also traveling. (d) The average distance of travel within $\Delta T_0 = 30$ min
remains constant during the day within 6 ± 2 km, a reasonable value that
may correspond to a combination of walking and motorized transportation.
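
The pair selection described above can be sketched as follows for a single user. This is a minimal illustration under stated assumptions: call times are in minutes, tower positions are planar coordinates in km, a pair counts as having the fixed interevent time when its gap lies within a tolerance `tol` of 30 min (the tolerance is a hypothetical parameter, not specified in the text), and the average distance is taken over pairs with a change of location.

```python
import numpy as np

def travel_stats(times_min, xy_km, dt0=30.0, tol=5.0):
    """Count consecutive-call pairs near interevent time dt0 and their trips.

    Returns (n_pairs, n_trips, fraction of pairs with a change of
    location, mean distance in km over the pairs that moved).
    """
    times = np.asarray(times_min, dtype=float)
    xy = np.asarray(xy_km, dtype=float)
    order = np.argsort(times)
    times, xy = times[order], xy[order]
    gaps = np.diff(times)
    # Assumption: accept gaps within +/- tol minutes of dt0.
    keep = np.abs(gaps - dt0) <= tol
    disp = np.linalg.norm(np.diff(xy, axis=0), axis=1)[keep]
    moved = disp > 0
    n_pairs = int(keep.sum())
    n_trips = int(moved.sum())
    frac = n_trips / n_pairs if n_pairs else float("nan")
    mean_dist = float(disp[moved].mean()) if n_trips else float("nan")
    return n_pairs, n_trips, frac, mean_dist

# Toy example (made-up records): five calls of one user.
times = [0.0, 30.0, 62.0, 200.0, 228.0]
towers = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0], [3.0, 4.0], [9.0, 12.0]]
n_pairs, n_trips, frac, mean_dist = travel_stats(times, towers)
```

Binning the selected pairs by time of day and repeating the counts per bin would give curves analogous to those described for Fig. 7.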
4 Conclusions



Novel aspects of human dynamics and social interactions were addressed by 
means of mobile phone data with time and space resolution. This allowed 
us to study the mean collective behavior at large scales and focus on the oc- 
currence of anomalous events. Considering a fixed time slice, we partitioned 
the space using a regular grid and studied the aggregated call activity inside 
each square bin forming the grid. We showed that anomalous events give rise 
to spatially extended patterns that can be meaningfully quantified in terms 
of standard percolation observables. By considering a series of consecutive 
time slices, we could investigate the rise, clustering and decay of spatially ex- 
tended anomalous events, which could be relevant e.g. in real-time detection 
of emergency situations. 

We also investigated patterns of calling activity at the individual level. 
We observed that the interevent time of consecutive calls is heavy-tailed, a 
finding that has implications for dynamics of spreading phenomena on social 
networks, and that agrees with results previously reported on other, related 
human activities. We also showed that, despite the complexity inherent in the
interevent calling patterns, it is still possible to recover some characteristic 
values from the behavior of the population that are stationary during the 
day, such as the fraction of active traveling population and their average 
distance traveled. 

In many ways, these results represent only a first step towards understand- 
ing human activity patterns. Our results indicate that the rich information 
provided by mobile communication data opens avenues to addressing novel
problems. These tools offer a chance to improve our understanding of com- 
plex networks as well [33 EHl EHl SHI IHl 1121 SSI IS] , by potentially correlating 
the structure of social networks with the spatial layout of the users as nodes 
[l5l H6l W7\ HHl SHI EDI El] , thus contributing to a better understanding of the 
spatiotemporal features of network evolution. 